Ensemble-based Feature Selection Criteria
Authors
Abstract
Recursive Feature Elimination (RFE) combined with feature ranking is an effective technique for eliminating irrelevant features when the feature dimension is large, but it has difficulty distinguishing between relevant and redundant features. The usual way of deciding when to stop eliminating features relies on either a validation set or cross-validation. In this paper, we present feature selection criteria based on out-of-bootstrap (OOB) estimates and class separability, both computed on the training set, thereby obviating the need for validation. The RFE method described in this paper uses a two-class neural network classifier, and features are ranked by the magnitude of the neural network weights. This approach is compared experimentally with a noisy bootstrapped version of Fisher's Linear Discriminant (FLD) used to rank features. The techniques are extended to multi-class problems using the Error-Correcting Output Coding (ECOC) method. Experimental investigation on artificial and natural benchmark data demonstrates the effectiveness of these criteria in selecting the optimal number of features and the classifier complexity. Furthermore, because the location of the influential features in the simulated data is known, Receiver Operating Characteristic (ROC) curves can be used to demonstrate the performance of RFE.
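The loop described in the abstract (train a bootstrapped ensemble, score it on the out-of-bootstrap patterns, rank features by weight magnitude, drop the weakest) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: a single logistic unit trained by SGD stands in for the two-class neural network, one feature is removed per round, the subset with the lowest OOB error is taken as the stopping point, and all function names, hyperparameters, and the toy data are hypothetical.

```python
import math
import random

def train_logistic(X, y, epochs=15, lr=0.05):
    """Fit one logistic unit by SGD (assumed stand-in for the neural network)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))         # keep exp() in range
            g = 1.0 / (1.0 + math.exp(-z)) - yi  # log-loss gradient w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0

def ensemble_step(X, y, active, n_boot, rng):
    """Train a bootstrap ensemble on the active features.
    Returns (OOB error, mean |weight| per active feature)."""
    n = len(X)
    Xa = [[row[j] for j in active] for row in X]
    votes = [[0, 0] for _ in range(n)]  # OOB votes per pattern: [class 0, class 1]
    importance = [0.0] * len(active)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        w, b = train_logistic([Xa[i] for i in idx], [y[i] for i in idx])
        importance = [s + abs(wj) / n_boot for s, wj in zip(importance, w)]
        for i in set(range(n)) - set(idx):  # patterns left out of this bootstrap
            votes[i][predict(w, b, Xa[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    wrong = sum((votes[i][1] > votes[i][0]) != (y[i] == 1) for i in scored)
    return wrong / len(scored), importance

def rfe_oob(X, y, n_boot=8, seed=0):
    """Drop the lowest-ranked feature each round; record OOB error per subset."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    history = []  # list of (feature subset, OOB error)
    while active:
        err, importance = ensemble_step(X, y, active, n_boot, rng)
        history.append((list(active), err))
        weakest = min(range(len(active)), key=lambda k: importance[k])
        del active[weakest]
    return history

def make_data(n=150, n_noise=6, seed=1):
    """Toy data (hypothetical): features 0 and 1 carry the signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        shift = 2.0 if label else -2.0
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y

X, y = make_data()
history = rfe_oob(X, y)
best_subset, best_err = min(history, key=lambda h: h[1])  # stop where OOB error is lowest
for subset, err in history:
    print(len(subset), subset, round(err, 3))
```

Because every pattern is scored only by ensemble members that never saw it during training, the OOB error behaves like a validation estimate without setting data aside, which is the property the abstract relies on. The class-separability criterion and the ECOC multi-class extension are not covered by this sketch.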
Related articles
The Usefulness of Ensemble Regressions and Optimal Predictor Variable Selection Methods in Predicting Stock Returns
This paper examines the usefulness of ensemble regressions and of methods for selecting optimal predictor variables (including the correlation-based method and Relief) for predicting the stock returns of companies listed on the Tehran Stock Exchange. To assess the performance of ensemble regression, the evaluation measures for its predictions (including mean absolute percentage error, root mean squared error, and the coefficient of determination), against linear regression and artificial neural networks...
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...
Bi-criteria Genetic Selection of Bagging Fuzzy Rule-based Multiclassification Systems
Previously we proposed a scheme to generate fuzzy rule-based multiclassification systems by means of bagging, mutual information-based feature selection, and a multicriteria genetic algorithm (GA) for static component classifier selection guided by the ensemble training error. In the current contribution we extend the latter component by the use of two bi-criteria fitness functions, combining t...
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. Next, a platform is developed as an intermediate step toward developing an intell...
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory from social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the theory's four fundamental criteria are satisfied. This theory has been applied to clustering problems. Previous research showed that it can significantly increase the stability and performance of...